The Annals of Statistics 

2012, Vol. 40, No. 4, 1989-1996 

DOI: 10.1214/12-AOS985 

Main article DOI: 10.1214/11-AOS949

© Institute of Mathematical Statistics, 2012 

DISCUSSION: LATENT VARIABLE GRAPHICAL MODEL 
SELECTION VIA CONVEX OPTIMIZATION

By Zhao Ren and Harrison H. Zhou

1. Introduction. We would like to congratulate the authors for their
refreshing contribution to the high-dimensional latent variable graphical
model selection problem. Estimation of covariance and concentration
matrices is fundamentally important in several classical statistical
methodologies and many applications. Recently, sparse concentration matrix
estimation has received considerable attention, partly due to its
connection to sparse structure learning for Gaussian graphical models. See,
for example, Meinshausen and Bühlmann (2006) and Ravikumar et al. (2011).
Cai, Liu and Zhou (2012) considered rate-optimal estimation.

The authors extended the current scope to include latent variables. They
assume that the fully observed Gaussian graphical model has a naturally
sparse dependence graph. However, only partial observations are available,
for which the graph is usually no longer sparse. Let $X$ be
$(p+r)$-variate Gaussian with a sparse concentration matrix $S^*_{(O,H)}$.
We only observe $X_O$, $p$ out of the whole $p+r$ variables, and denote its
covariance matrix by $\Sigma^*_O$. In this case, the $p \times p$
concentration matrix $(\Sigma^*_O)^{-1}$ is usually not sparse. Let $S^*$
be the concentration matrix of the observed variables conditioned on the
latent variables, which is a submatrix of $S^*_{(O,H)}$ and hence has a
sparse structure, and let $L^*$ be the summary of the marginalization over
the latent variables; its rank corresponds to the number of latent
variables $r$, which we usually assume to be small. The authors observed
that $(\Sigma^*_O)^{-1}$ can be decomposed as the difference of the sparse
matrix $S^*$ and the rank $r$ matrix $L^*$, that is,
$(\Sigma^*_O)^{-1} = S^* - L^*$. Following traditional wisdom, the authors
then naturally proposed a regularized maximum likelihood approach to
estimate both the sparse structure $S^*$ and the low-rank part $L^*$,

$$
\min_{(S,L):\, S-L \succ 0,\ L \succeq 0}
  \operatorname{tr}\bigl((S-L)\Sigma^n_O\bigr) - \log\det(S-L)
  + \lambda_n\bigl(\gamma\|S\|_1 + \operatorname{tr}(L)\bigr),
$$

where $\Sigma^n_O$ is the sample covariance matrix,
$\|S\|_1 = \sum_{i,j}|s_{ij}|$, and $\gamma$ and $\lambda_n$ are
regularization tuning parameters. Here $\operatorname{tr}(L)$ is the trace
of $L$. The notation $A \succ 0$ means that $A$ is positive definite, and
$A \succeq 0$ denotes that $A$ is nonnegative definite.
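
The decomposition $(\Sigma^*_O)^{-1} = S^* - L^*$ is the Schur complement
identity for the joint concentration matrix. A minimal numerical sketch,
assuming a hypothetical toy model (a chain among $p = 6$ observed variables
plus $r = 1$ latent variable; the function name and all numbers are
illustrative and not taken from the paper):

```python
import numpy as np

def marginal_decomposition(K, p):
    """Split a joint concentration matrix K, whose first p coordinates are
    observed, into (S, L) such that inv(Sigma_O) = S - L."""
    S = K[:p, :p]                              # observed block: sparse part
    K_OH, K_H = K[:p, p:], K[p:, p:]
    L = K_OH @ np.linalg.solve(K_H, K_OH.T)    # marginalization term, rank <= r
    return S, L

# Hypothetical toy model: sparse chain on 6 observed variables, 1 latent one.
p, r = 6, 1
K = 2.0 * np.eye(p + r)
for i in range(p - 1):
    K[i, i + 1] = K[i + 1, i] = 0.3
K[:p, p:] = 0.4                                # latent variable touches every node
K[p:, :p] = 0.4

Sigma_O = np.linalg.inv(K)[:p, :p]             # covariance of the observed part
S, L = marginal_decomposition(K, p)
marginal_precision = np.linalg.inv(Sigma_O)

print(np.allclose(marginal_precision, S - L))  # does the decomposition hold?
print(np.linalg.matrix_rank(L))                # rank of the low-rank part
print(np.count_nonzero(S),                     # nonzeros before marginalization
      np.count_nonzero(np.abs(marginal_precision) > 1e-10))  # and after
```

In this toy example the conditional concentration matrix stays sparse while
the marginal one fills in completely, which is the situation the
decomposition is meant to untangle.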

There is an obvious identifiability problem if we want to estimate both
the sparse and low-rank components, since a matrix can be both sparse and
low rank. By exploring the geometric properties of the tangent spaces for
sparse and low-rank components, the authors gave a beautiful sufficient
condition for identifiability, and then provided rather involved
theoretical justifications based on that condition. These are beyond our
ability to digest in a short period of time, in the sense that we don't
fully understand why those technical assumptions were needed in the
analysis of their approach. Thus, we decided to look at a relatively simple
but potentially practical model, with the hope of still capturing the
essence of the problem, and to see how well their regularized procedure
works. Let $\|\cdot\|_{1\to1}$ denote the matrix $\ell_1$ norm, that is,
$\|S\|_{1\to1} = \max_{1\le j\le p}\sum_{i=1}^p |s_{ij}|$. We assume that
$S^*$ is in the following uniformity class:

$$
\mathcal{U}(s_0(p), M_p) = \Bigl\{ S = (s_{ij}) : S \succ 0,\
  \|S\|_{1\to1} \le M_p,\
  \max_{1\le j\le p}\sum_{i=1}^p 1\{s_{ij}\neq 0\} \le s_0(p) \Bigr\},
\tag{1}
$$

where we allow $s_0(p)$ and $M_p$ to grow as $p$ and $n$ increase. This
uniformity class was considered in Ravikumar et al. (2011) and Cai, Liu and
Luo (2011). For the low-rank matrix $L^*$, we assume that the effect of
marginalization over the latent variables spreads out, that is, the
low-rank matrix $L^*$ has row/column spaces that are not closely aligned
with the coordinate axes, which resolves the identifiability problem. Let
the eigen-decomposition of $L^*$ be as follows:

$$
L^* = \sum_{i=1}^{r_0(p)} \lambda_i u_i u_i^T,
\tag{2}
$$

where $r_0(p)$ is the rank of $L^*$. We assume that there exists a
universal constant $c_0$ such that $\|u_i\|_\infty \le \sqrt{c_0/p}$ for
all $i$, and that $\|L^*\|_{1\to1}$ is bounded by $M_p$, which can be shown
to be bounded by $c_0 r_0$. A similar incoherence assumption on $u_i$ was
used in Candès and Recht (2009). We further assume that

$$
\lambda_{\max}(\Sigma^*_O) \le M \quad\text{and}\quad
\lambda_{\min}(\Sigma^*_O) \ge 1/M
\tag{3}
$$

for some universal constant $M$.
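
The model assumptions can be checked numerically on small examples. The
sketch below is only illustrative: the helper names, the tolerance values
and the two rank-one test matrices are our own assumptions. It tests
membership in the uniformity class and computes the smallest incoherence
constant $c_0$ compatible with a given low-rank matrix.

```python
import numpy as np

def matrix_l1_norm(S):
    """Matrix l1 norm: the largest absolute column sum."""
    return np.abs(S).sum(axis=0).max()

def in_uniformity_class(S, s0, Mp, tol=1e-12):
    """Check positive definiteness, the l1-norm bound and column sparsity."""
    is_pd = np.linalg.eigvalsh(S).min() > 0
    max_nonzeros = (np.abs(S) > tol).sum(axis=0).max()
    return bool(is_pd and matrix_l1_norm(S) <= Mp and max_nonzeros <= s0)

def incoherence_constant(L, tol=1e-10):
    """Smallest c0 with max_i ||u_i||_inf <= sqrt(c0 / p), taken over the
    eigenvectors u_i of L that belong to nonzero eigenvalues."""
    p = L.shape[0]
    vals, vecs = np.linalg.eigh(L)
    U = vecs[:, np.abs(vals) > tol]
    return p * np.abs(U).max() ** 2

p = 6
S = 2.0 * np.eye(p) + 0.3 * (np.eye(p, k=1) + np.eye(p, k=-1))  # tridiagonal
spread = 0.08 * np.ones((p, p))   # rank one, eigenvector (1, ..., 1) / sqrt(p)
spiky = np.zeros((p, p))
spiky[0, 0] = 0.5                 # rank one, aligned with a coordinate axis

print(in_uniformity_class(S, s0=3, Mp=3.0))
print(incoherence_constant(spread), incoherence_constant(spiky))
```

A spread-out eigenvector gives a constant near 1, whereas an axis-aligned
one gives a constant of order $p$; the latter matrix is itself sparse,
which is exactly the case the incoherence assumption rules out.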

As discussed in the paper, the goals in latent variable model selection
are to obtain sign consistency for the sparse matrix $S^*$ as well as
rank consistency for the low-rank positive semidefinite matrix $L^*$.
Denote the minimum magnitude of the nonzero entries of $S^*$ by $\theta$,
that is, $\theta = \min_{(i,j):\, s_{ij}\neq 0} |s_{ij}|$, and the minimum
nonzero eigenvalue of $L^*$ by $\sigma$, that is,
$\sigma = \min_{1\le i\le r_0} \lambda_i$. To obtain theoretical guarantees
of consistency for the model described in (1), (2) and (3), in addition
to the strong irrepresentability condition, which seems to be difficult to
check in practice, the authors require the following assumptions (by a
translation of the conditions in the paper to this model) on $\theta$,
$\sigma$ and $n$:

(1) $\theta \gtrsim \sqrt{p/n}$, which is needed even when $s_0(p)$ is
constant;

(2) $\sigma \gtrsim s_0^3(p)\sqrt{p/n}$ under the additional strong
assumptions on the Fisher information matrix $\Sigma^*_O \otimes \Sigma^*_O$
(see the footnote for Corollary 4.2);

(3) $n \gtrsim s_0^4(p)\,p$.

However, for sparse graphical model selection without latent variables,
either the $\ell_1$-regularized maximum likelihood approach [see Ravikumar
et al. (2011)] or CLIME [see Cai, Liu and Luo (2011)] can be shown to be
sign consistent if the minimum magnitude $\theta$ of the nonzero entries of
the concentration matrix is at the order of $\sqrt{(\log p)/n}$ when $M_p$
is bounded, which inspires us to study rate-optimality for this latent
variable graphical model selection problem. In this discussion, we propose
a procedure to obtain an algebraically consistent estimate of the latent
variable Gaussian graphical model under much weaker conditions on both
$\theta$ and $\sigma$. For example, for a wide range of $s_0(p)$, we only
require that $\theta$ be at the order of $\sqrt{(\log p)/n}$ and $\sigma$
be at the order of $\sqrt{p/n}$ to consistently estimate the support of
$S^*$ and the rank of $L^*$. This means the regularized maximum likelihood
approach could be far from optimal, but we don't know yet whether the
suboptimality is due to the procedure or to its theoretical analysis.

2. Latent variable model selection consistency. In this section we
propose a procedure to obtain an algebraically consistent estimate of the
latent variable Gaussian graphical model. The condition on $\theta$ to
recover the support of $S^*$ is reduced to that in Cai, Liu and Luo (2011),
which studied sparse graphical model selection without latent variables,
and the condition on $\sigma$ is just at an order of $\sqrt{p/n}$, which is
smaller than the $s_0^3(p)\sqrt{p/n}$ assumed in the paper when
$s_0(p) \to \infty$. When $M_p$ is bounded, our results can be shown to be
rate-optimal by the lower bounds stated in Remarks 2 and 4, for which we do
not give proofs due to space limitations.

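2.1. Sign consistency procedure of $S^*$. We propose a CLIME-like
estimator of $S^*$ by solving the following linear optimization problem:

$$
\min \|S\|_1 \quad\text{subject to}\quad
\|\Sigma^n_O S - I\|_\infty \le \tau_n,\quad S \in \mathbb{R}^{p\times p},
$$

where $\Sigma^n_O = (\tilde\sigma_{ij})$ is the sample covariance matrix.
The tuning parameter $\tau_n$ is chosen as
$\tau_n = C_1 M_p \sqrt{(\log p)/n}$ for some large constant $C_1$. Let
$\hat S_1 = (\hat s^1_{ij})$ be the solution. The CLIME-like estimator
$\hat S = (\hat s_{ij})$ is obtained by symmetrizing $\hat S_1$ as follows:

$$
\hat s_{ij} = \hat s_{ji}
  = \hat s^1_{ij}\,1\{|\hat s^1_{ij}| \le |\hat s^1_{ji}|\}
  + \hat s^1_{ji}\,1\{|\hat s^1_{ij}| > |\hat s^1_{ji}|\}.
$$

In other words, we take the one with smaller magnitude between
$\hat s^1_{ij}$ and $\hat s^1_{ji}$. We define a thresholding estimator
$\tilde S = (\tilde s_{ij})$ with

$$
\tilde s_{ij} = \hat s_{ij}\,1\{|\hat s_{ij}| > 9 M_p \tau_n\}
\tag{4}
$$

to estimate the support of $S^*$.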
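
The two post-processing steps (symmetrization by smaller magnitude, then
hard thresholding) are easy to sketch once the linear program has been
solved. The code below assumes the LP solution is already available as a
NumPy array; the function names, the example matrix and the threshold
level 0.35 (standing in for $9 M_p \tau_n$) are hypothetical.

```python
import numpy as np

def symmetrize_smaller_magnitude(S1):
    """For every pair (i, j), keep whichever of S1[i, j] and S1[j, i] has the
    smaller absolute value, and use it in both positions."""
    chosen = np.where(np.abs(S1) <= np.abs(S1.T), S1, S1.T)
    upper = np.triu(chosen)                  # decide each pair once (i <= j)
    return upper + np.triu(chosen, k=1).T    # mirror onto the lower triangle

def hard_threshold(S_hat, level):
    """Set to zero every entry whose magnitude does not exceed `level`."""
    return np.where(np.abs(S_hat) > level, S_hat, 0.0)

# Hypothetical (non-symmetric) LP output and threshold level.
S1 = np.array([[2.00, 0.50, 0.01],
               [0.40, 2.00, -0.30],
               [0.00, -0.35, 2.00]])
S_hat = symmetrize_smaller_magnitude(S1)
S_tilde = hard_threshold(S_hat, level=0.35)
support = S_tilde != 0

print(S_hat)
print(support)
```

Mirroring the upper triangle guarantees a symmetric result even when the
two candidates tie in magnitude but differ in sign.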

Theorem 1. Suppose that $S^* \in \mathcal{U}(s_0(p), M_p)$,

$$
\sqrt{(\log p)/n} = o(1) \quad\text{and}\quad \|L^*\|_\infty \le M_p \tau_n.
\tag{5}
$$

With probability greater than $1 - C_s p^{-6}$ for some constant $C_s$
depending on $M$ only, we have

$$
\|\hat S - S^*\|_\infty \le 9 M_p \tau_n.
$$

Hence, if the minimum magnitude of nonzero entries satisfies
$\theta > 18 M_p \tau_n$, we obtain the sign consistency
$\operatorname{sign}(\tilde S) = \operatorname{sign}(S^*)$. In particular,
if $M_p$ is at the constant level, then to consistently recover the support
of $S^*$, we only need that $\theta \asymp \sqrt{(\log p)/n}$.
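
The step from the entrywise bound to sign consistency can be spelled out
as follows. This is a short sketch of the standard thresholding argument,
writing $\hat s_{ij}$ for the symmetrized estimate, $\tilde s_{ij}$ for its
thresholded version in (4) and $s^*_{ij}$ for the entries of $S^*$ (the
starred entry notation is ours).

```latex
% On the event \|\hat S - S^*\|_\infty \le 9 M_p \tau_n:
\begin{align*}
s^*_{ij} = 0 &\;\Longrightarrow\; |\hat s_{ij}| \le 9 M_p \tau_n
  \;\Longrightarrow\; \tilde s_{ij} = 0, \\
s^*_{ij} \ne 0 &\;\Longrightarrow\; |\hat s_{ij}| \ge |s^*_{ij}| - 9 M_p \tau_n
  \ge \theta - 9 M_p \tau_n > 9 M_p \tau_n
  \;\Longrightarrow\; \tilde s_{ij} = \hat s_{ij}.
\end{align*}
% In the second case |\hat s_{ij} - s^*_{ij}| \le 9 M_p \tau_n < |s^*_{ij}|,
% so \hat s_{ij} and s^*_{ij} share the same sign.
```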

Proof. The proof is similar to that of Theorem 7 in Cai, Liu and Luo
(2011). The sub-Gaussian condition with spectral norm upper bound $M$
implies that each empirical covariance $\tilde\sigma_{ij}$ satisfies the
following large deviation result:

$$
P(|\tilde\sigma_{ij} - \sigma_{ij}| > t) \le
C_s \exp\Bigl(-\frac{8}{C_2^2}\, n t^2\Bigr)
\quad\text{for } |t| \le \phi,
$$

where $C_s$, $C_2$ and $\phi$ only depend on $M$. See, for example, Bickel
and Levina (2008). In particular, for $t = C_2\sqrt{(\log p)/n}$, which is
less than $\phi$ by our assumption, we have

$$
P(\|\Sigma^*_O - \Sigma^n_O\|_\infty > t)
\le \sum_{i,j} P(|\tilde\sigma_{ij} - \sigma_{ij}| > t)
\le p^2 \cdot C_s p^{-8}.
\tag{6}
$$
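
The exponent in (6) follows from plugging the chosen $t$ into the large
deviation bound. The short calculation below assumes that the constant in
the exponent is $8/C_2^2$, which is the normalization under which the
stated $p^{-8}$ tail appears.

```latex
\begin{align*}
t = C_2\sqrt{\frac{\log p}{n}}
&\;\Longrightarrow\;
C_s\exp\Bigl(-\frac{8}{C_2^2}\,n t^2\Bigr)
= C_s\exp(-8\log p) = C_s p^{-8}, \\
\sum_{i,j=1}^{p} C_s p^{-8} &= p^2\cdot C_s p^{-8} = C_s p^{-6}.
\end{align*}
```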
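Let

$$
A = \bigl\{ \|\Sigma^*_O - \Sigma^n_O\|_\infty \le C_2\sqrt{(\log p)/n} \bigr\}.
$$

Equation (6) implies $P(A) \ge 1 - C_s p^{-6}$. On event $A$, we will show

$$
\|(S^* - L^*) - \hat S_1\|_\infty \le 8 M_p \tau_n,
\tag{7}
$$

which immediately yields

$$
\|S^* - \hat S\|_\infty \le \|(S^* - L^*) - \hat S_1\|_\infty + \|L^*\|_\infty
\le 8 M_p \tau_n + M_p \tau_n = 9 M_p \tau_n.
$$

Now we establish equation (7). On event $A$, for some large constant
$C_1 \ge 2C_2$, the choice of $\tau_n$ yields

$$
2 M_p \|\Sigma^*_O - \Sigma^n_O\|_\infty \le \tau_n.
\tag{8}
$$

By the matrix $\ell_1$ norm assumption, we obtain

$$
\|(\Sigma^*_O)^{-1}\|_{1\to1} \le \|S^*\|_{1\to1} + \|L^*\|_{1\to1} \le 2 M_p.
\tag{9}
$$

From (8) and (9) we have

$$
\|\Sigma^n_O(S^* - L^*) - I\|_\infty
= \|(\Sigma^n_O - \Sigma^*_O)(\Sigma^*_O)^{-1}\|_\infty
\le \|\Sigma^n_O - \Sigma^*_O\|_\infty \|(\Sigma^*_O)^{-1}\|_{1\to1}
\le \tau_n,
$$

which implies

$$
\|\Sigma^n_O((S^* - L^*) - \hat S_1)\|_\infty
\le \|\Sigma^n_O(S^* - L^*) - I\|_\infty + \|\Sigma^n_O \hat S_1 - I\|_\infty
\le 2\tau_n.
\tag{10}
$$

From the definition of $\hat S_1$ we obtain

$$
\|\hat S_1\|_{1\to1} \le \|S^* - L^*\|_{1\to1} \le 2 M_p,
\tag{11}
$$

which, together with equations (8) and (10), implies

$$
\begin{aligned}
\|\Sigma^*_O((S^* - L^*) - \hat S_1)\|_\infty
&\le \|\Sigma^n_O((S^* - L^*) - \hat S_1)\|_\infty
   + \|(\Sigma^n_O - \Sigma^*_O)((S^* - L^*) - \hat S_1)\|_\infty \\
&\le 2\tau_n + \|\Sigma^n_O - \Sigma^*_O\|_\infty
   \|(S^* - L^*) - \hat S_1\|_{1\to1} \\
&\le 2\tau_n + 4 M_p \|\Sigma^n_O - \Sigma^*_O\|_\infty \le 4\tau_n.
\end{aligned}
$$

Thus, we have

$$
\|(S^* - L^*) - \hat S_1\|_\infty
\le \|(\Sigma^*_O)^{-1}\|_{1\to1}
    \|\Sigma^*_O((S^* - L^*) - \hat S_1)\|_\infty
\le 8 M_p \tau_n. \qquad\square
$$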
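
The proof repeatedly bounds the entrywise maximum of a product by the
entrywise maximum of one factor times the matrix $\ell_1$ norm of the
other. A quick numerical sanity check of that inequality on random
matrices (the helper names and the random seed are our own choices):

```python
import numpy as np

def max_norm(A):
    """Entrywise maximum absolute value."""
    return np.abs(A).max()

def matrix_l1_norm(A):
    """Matrix l1 norm: the largest absolute column sum."""
    return np.abs(A).sum(axis=0).max()

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((5, 5))

# Each entry of A @ B is a sum over i of A[k, i] * B[i, j], so its magnitude
# is at most max|A| times the absolute sum of column j of B.
lhs = max_norm(A @ B)
rhs = max_norm(A) * matrix_l1_norm(B)
print(lhs, rhs)
```

For a symmetric left factor, as with $(\Sigma^*_O)^{-1}$ in the last
display of the proof, column sums and row sums coincide, so the same bound
applies with the roles of the two factors exchanged.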

Remark 1. By the choice of our $\tau_n$ and the eigen-decomposition of
$L^*$, the condition $\|L^*\|_\infty \le M_p \tau_n$ holds when
$r_0(p) c_0 / p \le C_1 M_p^2 \sqrt{(\log p)/n}$, that is,
$p^2 \log p \gtrsim n r_0^2(p) M_p^{-4}$. If $M_p$ is slowly increasing
(e.g., $p^{1/4-\tau}$ for any small $\tau > 0$), the minimum requirement
$\theta \asymp M_p^2\sqrt{(\log p)/n}$ is weaker than
$\theta \gtrsim \sqrt{p/n}$ required in Corollary 4.2. Furthermore, it can
be shown that the optimal rate of the minimum magnitude of nonzero entries
for sign consistency is $\theta \asymp M_p\sqrt{(\log p)/n}$, as in Cai,
Liu and Zhou (2012).

Remark 2. Cai, Liu and Zhou (2012) showed that the minimum requirement for
$\theta$, $\theta \asymp M_p\sqrt{(\log p)/n}$, is necessary for sign
consistency for sparse concentration matrices. Let $\mathcal{U}_S(c)$
denote the class of concentration matrices defined in (1) and (2),
satisfying assumption (5) and $\theta > c M_p \sqrt{(\log p)/n}$. We can
show that there exists some constant $c_1 > 0$ such that for all
$0 < c < c_1$,

$$
\lim_{n\to\infty} \inf_{(\hat S, \hat L)} \sup_{\mathcal{U}_S(c)}
P\bigl(\operatorname{sign}(\hat S) \neq \operatorname{sign}(S^*)\bigr) > 0,
$$

similar to Cai, Liu and Zhou (2012).

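2.2. Rank consistency procedure of $L^*$. In this section we propose a
procedure to estimate $L^*$ and its rank. We note that with high
probability $\Sigma^n_O$ is invertible, and then define
$\hat L = \tilde S - (\Sigma^n_O)^{-1}$, where $\tilde S$ is defined in
(4); the sign follows from the decomposition
$(\Sigma^*_O)^{-1} = S^* - L^*$. Denote the eigen-decomposition of
$\hat L$ by $\sum_{i=1}^p \lambda_i(\hat L) v_i v_i^T$, and let
$\tilde\lambda_i(\hat L) = \lambda_i(\hat L)\,1\{\lambda_i(\hat L) > C_3\sqrt{p/n}\}$,
where the constant $C_3$ will be specified later. Define
$\tilde L = \sum_{i=1}^p \tilde\lambda_i(\hat L) v_i v_i^T$. The following
theorem shows that $\hat L$ is a consistent estimator of $L^*$ under the
spectral norm and that, with high probability,
$\operatorname{rank}(L^*) = \operatorname{rank}(\tilde L)$.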
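
The rank procedure amounts to an eigenvalue hard-thresholding step. The
sketch below is illustrative only: it uses the convention
$\hat L = \tilde S - (\Sigma^n_O)^{-1}$, which matches the decomposition
$(\Sigma^*_O)^{-1} = S^* - L^*$ from the Introduction, and it feeds in
noiseless population quantities from a hypothetical toy model instead of a
real sample. The function name, the value `C3 = 1.0` and `n = 1000` are
assumptions made for the example.

```python
import numpy as np

def estimate_low_rank(S_tilde, Sigma_n, n, C3):
    """Return (L_tilde, rank): eigenvalues of L_hat = S_tilde - inv(Sigma_n)
    that do not exceed C3 * sqrt(p / n) are set to zero."""
    p = Sigma_n.shape[0]
    L_hat = S_tilde - np.linalg.inv(Sigma_n)
    L_hat = (L_hat + L_hat.T) / 2            # remove round-off asymmetry
    vals, vecs = np.linalg.eigh(L_hat)
    kept = np.where(vals > C3 * np.sqrt(p / n), vals, 0.0)
    L_tilde = (vecs * kept) @ vecs.T         # rebuild from surviving eigenpairs
    return L_tilde, int(np.count_nonzero(kept))

# Hypothetical toy model: chain on 6 observed variables, 1 latent variable.
p = 6
K = 2.0 * np.eye(p + 1)
for i in range(p - 1):
    K[i, i + 1] = K[i + 1, i] = 0.3
K[:p, p:] = 0.4
K[p:, :p] = 0.4

S_star = K[:p, :p].copy()                    # plays the role of S_tilde
Sigma_O = np.linalg.inv(K)[:p, :p]           # plays the role of Sigma_n
L_tilde, rank = estimate_low_rank(S_star, Sigma_O, n=1000, C3=1.0)
print(rank)
```

With real data the inputs would be the thresholded estimator from (4) and
the sample covariance matrix, and the constant would have to be chosen in
line with the theory.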

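Theorem 2. Under the conditions in Theorem 1, we assume that

$$
\sqrt{\frac{p}{n}} \le \frac{1}{16\sqrt{2}\,M^2}
\quad\text{and}\quad
M_p^2 s_0(p) \le \sqrt{\frac{p}{\log p}}.
\tag{12}
$$

Then there exists some constant $C_3$ such that

$$
\|\hat L - L^*\| \le C_3 \sqrt{\frac{p}{n}}
$$

with probability greater than $1 - 2e^{-p} - C_s p^{-6}$. Hence, if
$\sigma > 2C_3\sqrt{p/n}$, we have
$\operatorname{rank}(L^*) = \operatorname{rank}(\tilde L)$ with high
probability.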
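
The rank statement follows from the spectral norm bound through Weyl's
inequality for symmetric matrices. A brief sketch, with the eigenvalues of
both matrices sorted in decreasing order and $r_0$ denoting the rank of
$L^*$:

```latex
\begin{align*}
|\lambda_i(\hat L) - \lambda_i(L^*)| &\le \|\hat L - L^*\| \le C_3\sqrt{p/n}
  && \text{(Weyl, all } i\text{)}, \\
i \le r_0:\quad \lambda_i(\hat L) &\ge \sigma - C_3\sqrt{p/n} > C_3\sqrt{p/n}
  && \text{(kept by the threshold)}, \\
i > r_0:\quad \lambda_i(\hat L) &\le 0 + C_3\sqrt{p/n}
  && \text{(set to zero by the threshold)}.
\end{align*}
```

Exactly $r_0$ eigenvalues therefore survive the thresholding on the event
where the spectral norm bound holds.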

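Proof. From Corollary 5.5 of the paper and our assumption on the sample
size, we have

$$
P\Bigl(\|\Sigma^n_O - \Sigma^*_O\| \ge \sqrt{128}\,M\sqrt{\frac{p}{n}}\Bigr)
\le 2\exp(-p).
$$

Note that $\lambda_{\min}(\Sigma^*_O) \ge 1/M$ and
$\sqrt{128}\,M\sqrt{p/n} \le 1/(2M)$ under assumption (12); then
$\lambda_{\min}(\Sigma^n_O) \ge 1/(2M)$ with high probability, which yields
the same rate of convergence for the concentration matrix, since

$$
\|(\Sigma^n_O)^{-1} - (\Sigma^*_O)^{-1}\|
\le \|(\Sigma^n_O)^{-1}\|\,\|(\Sigma^*_O)^{-1}\|\,\|\Sigma^n_O - \Sigma^*_O\|
\le 2M^2 \sqrt{128}\,M\sqrt{\frac{p}{n}} = 16\sqrt{2}\,M^3\sqrt{\frac{p}{n}}.
\tag{13}
$$

From Theorem 1 we know

$$
\operatorname{sign}(\tilde S) = \operatorname{sign}(S^*)
\quad\text{and}\quad \|\tilde S - S^*\|_\infty \le 9 M_p \tau_n
$$

with probability greater than $1 - C_s p^{-6}$. Since
$\|B\| \le \|B\|_{1\to1}$ for any symmetric matrix $B$, we then have

$$
\|\tilde S - S^*\| \le \|\tilde S - S^*\|_{1\to1} \le s_0(p)\, 9 M_p \tau_n
= 9 C_1 M_p^2 s_0(p) \sqrt{\frac{\log p}{n}}.
\tag{14}
$$

Equations (13) and (14), together with the assumption
$M_p^2 s_0(p) \le \sqrt{p/\log p}$, imply

$$
\begin{aligned}
\|\hat L - L^*\|
&\le \|(\Sigma^n_O)^{-1} - (\Sigma^*_O)^{-1}\| + \|\tilde S - S^*\| \\
&\le 16\sqrt{2}\,M^3\sqrt{\frac{p}{n}}
   + 9 C_1 M_p^2 s_0(p)\sqrt{\frac{\log p}{n}}
\le C_3\sqrt{\frac{p}{n}}
\end{aligned}
$$

with probability greater than $1 - 2e^{-p} - C_s p^{-6}$. □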

Remark 3. We should emphasize the fact that in order to consistently
estimate the rank of $L^*$ we need only that $\sigma > 2C_3\sqrt{p/n}$,
which is smaller than the $s_0^3(p)\sqrt{p/n}$ required in the paper (see
the footnote for Corollary 4.2), as long as
$M_p^2 s_0(p) \le \sqrt{p/\log p}$. In particular, we don't explicitly
constrain the rank $r_0(p)$. One special case is that $M_p$ is constant and
$s_0(p) \asymp p^{1/2-\tau}$ for some small $\tau > 0$, for which our
requirement is $\sqrt{p/n}$ but the assumption in the paper is at an order
of $p^{3(1/2-\tau)}\sqrt{p/n}$.
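
To get a feel for the gap in this special case, the two orders can be
evaluated for concrete values. The numbers below ($p = 10^4$, $n = 10^6$,
$\tau = 0.1$) are hypothetical, and all constants are dropped, so only the
ratio between the two quantities is meaningful.

```python
import math

def sigma_requirements(p, n, tau):
    """Orders (constants dropped) of the two conditions on sigma when
    s0(p) = p ** (1/2 - tau) and M_p is constant."""
    base = math.sqrt(p / n)          # order required in this discussion
    s0 = p ** (0.5 - tau)
    return base, s0 ** 3 * base      # second entry: order assumed in the paper

ours, paper = sigma_requirements(p=10_000, n=1_000_000, tau=0.1)
print(ours, paper, paper / ours)
```

The ratio equals $s_0^3(p) = p^{3(1/2-\tau)}$, which grows polynomially in
$p$.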



Remark 4. Let $\mathcal{U}_L(c)$ denote the class of concentration matrices
defined in (1), (2) and (3), satisfying assumptions (12), (5) and
$\sigma > c\sqrt{p/n}$. We can show that there exists some constant
$c_2 > 0$ such that for all $0 < c < c_2$,

$$
\lim_{n\to\infty} \inf_{(\hat S, \hat L)} \sup_{\mathcal{U}_L(c)}
P\bigl(\operatorname{rank}(\hat L) \neq \operatorname{rank}(L^*)\bigr) > 0.
$$

The proof of this lower bound is based on a modification of a lower bound
argument in a personal communication of T. Tony Cai (2011).

3. Concluding remarks and further questions. In this discussion we
attempt to understand the optimality of the results in the present paper by
studying a relatively simple model. Our preliminary analysis seems to
indicate that the results in the paper are suboptimal. In particular, we
tend to conclude that the assumptions on $\theta$ and $\sigma$ in the paper
can potentially be weakened considerably. However, it is not clear to us
whether the suboptimality is due to the methodology or just to its
theoretical analysis. We want to emphasize that the preliminary results in
this discussion can be strengthened, but for simplicity we choose to
present weaker but simpler results, in the hope of shedding some light on
understanding optimality in estimation.